
Generation of Compound Words in Statistical Machine Translation into Compounding Languages


Abstract

In this article we investigate statistical machine translation (SMT) into Germanic languages, with a focus on compound processing. Our main goal is to enable the generation of novel compounds that have not been seen in the training data. We adopt a split-merge strategy, where compounds are split before training the SMT system, and merged after the translation step. This approach reduces sparsity in the training data, but runs the risk of placing translations of compound parts in non-consecutive positions. It also requires a postprocessing step of compound merging, where compounds are reconstructed in the translation output. We present a method for increasing the chances that components that should be merged are translated into contiguous positions and in the right order and show that it can lead to improvements both by direct inspection and in terms of standard translation evaluation metrics. We also propose several new methods for compound merging, based on heuristics and machine learning, which outperform previously suggested algorithms. These methods can produce novel compounds and a translation with at least the same overall quality as the baseline. For all subtasks we show that it is useful to include part-of-speech based information in the translation process, in order to handle compounds.
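The split-merge strategy described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `#` marker on non-final compound parts, the vocabulary-based splitter, and all function names are assumptions made for illustration, and the merge step here is a naive rule rather than one of the heuristic or machine-learning methods the article proposes.

```python
MARK = "#"  # hypothetical marker for non-final compound parts (an assumption)

def split_compound(word, vocab, min_len=3):
    """Split `word` into parts found in `vocab`; return [word] if no full split exists."""
    def parts_of(rest):
        if rest in vocab:
            return [rest]
        # Try the longest known prefix first, then recurse on the remainder.
        for i in range(len(rest) - min_len, min_len - 1, -1):
            if rest[:i] in vocab:
                tail = parts_of(rest[i:])
                if tail is not None:
                    return [rest[:i]] + tail
        return None
    return parts_of(word) or [word]

def split_sentence(tokens, vocab):
    """Pre-training step: replace each compound by its parts, marking non-final ones."""
    out = []
    for tok in tokens:
        parts = split_compound(tok, vocab)
        out.extend(p + MARK for p in parts[:-1])
        out.append(parts[-1])
    return out

def merge_compounds(tokens):
    """Post-translation step: glue each run of marked parts onto the next token."""
    out, pending = [], []
    for tok in tokens:
        if tok.endswith(MARK):
            pending.append(tok[:-len(MARK)])
        else:
            out.append("".join(pending) + tok)
            pending = []
    if pending:  # dangling marked parts at sentence end are emitted as one word
        out.append("".join(pending))
    return out

vocab = {"haus", "tür", "schlüssel"}  # toy vocabulary, for illustration only
split = split_sentence(["der", "haustürschlüssel"], vocab)
merged = merge_compounds(split)
# Parts seen only inside other compounds can recombine into an unseen compound.
novel = merge_compounds(["ein", "schlüssel#", "haus"])
```

Because the merge rule only looks at the marker, it can form a compound that never occurred in the training data, which is the goal stated above. It also merges blindly: if the translation step places a marked part before an unrelated word, the two are glued together anyway, and linking morphemes such as the German "-s-" are ignored. These are the failure cases that the contiguity and merging methods in the article are meant to address.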
